JOPSS - Search Results

Search Results: Records 1-4 displayed on this page of 4

Journal Articles

A Stencil framework to realize large-scale computations beyond device memory capacity on GPU supercomputers

Shimokawabe, Takashi*; Endo, Toshio*; Onodera, Naoyuki; Aoki, Takayuki*

Proceedings of 2017 IEEE International Conference on Cluster Computing (IEEE Cluster 2017) (Internet), p.525 - 529, 2017/09

Stencil-based applications such as CFD have achieved high performance on GPU supercomputers. The problem sizes of these applications are limited by the GPU device memory capacity, which is typically smaller than the host memory. On GPU supercomputers, a locality improvement technique that combines the temporal blocking method with memory swapping between host and device enables large computations beyond the device memory capacity. Our high-productivity stencil framework automatically applies temporal blocking to the boundary exchange required for stencil computation and supports automatic memory swapping provided by an MPI/CUDA wrapper library. A framework-based application for airflow in an urban city retains 80% of its performance even when the problem size is twice the GPU memory capacity, and demonstrates good weak scalability on the TSUBAME 2.5 supercomputer.
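As a rough illustration of the temporal blocking idea described in this abstract, the sketch below advances a 1D three-point stencil several time steps on one tile at a time, where each tile carries a halo as wide as the number of fused steps. It is a minimal sketch under stated assumptions, not the framework's implementation: the function names (`step`, `run_global`, `run_temporal_blocked`), the averaging stencil, and the fixed-end boundary treatment are all hypothetical, and the host-to-device swap is only indicated by comments.

```python
def step(u):
    """One 3-point averaging stencil step; the two end values stay fixed."""
    if len(u) < 2:
        return list(u)
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def run_global(u, steps):
    """Reference: sweep the whole domain once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def run_temporal_blocked(u, steps, block):
    """Advance `steps` time steps tile by tile (hypothetical sketch).

    Each tile is loaded with a halo of `steps` cells per side, so the
    tile interior stays valid after `steps` local updates and the tile
    needs to be transferred only once per `steps` time steps.
    """
    n = len(u)
    out = list(u)
    for start in range(0, n, block):
        stop = min(start + block, n)
        lo = max(start - steps, 0)   # halo clipped at the domain boundary
        hi = min(stop + steps, n)
        tile = u[lo:hi]              # "swap in": host -> device in a real code
        for _ in range(steps):
            tile = step(tile)        # stale values creep in one cell per step
        out[start:stop] = tile[start - lo:stop - lo]  # "swap out" interior only
    return out
```

The halo cells are recomputed redundantly by neighbouring tiles, which is the price paid for fewer transfers; choosing the number of fused steps trades that redundant work against transfer volume.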

Oral presentation

Development of locally mesh-refined Lattice Boltzmann Method by using Temporal Blocking Method

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*

no journal

Real-time simulation of the environmental dynamics of radioactive substances is very important from the viewpoint of nuclear security. Since many tall buildings and complex structures make the air flow turbulent in urban cities, large-scale CFD simulations are needed. To this end, a CFD code based on the Lattice Boltzmann Method (LBM) with a block-based Adaptive Mesh Refinement (AMR) method is developed. As the conventional LBM based on a single relaxation time collision operator often becomes numerically unstable at high Reynolds numbers, we apply a state-of-the-art cumulant collision operator. The code is developed on a GPU cluster at JAEA. By using new functions in CUDA 8.0, the GPU kernel functions are tuned to achieve high performance on the latest Pascal GPU architecture. By introducing a temporal blocking technique, we achieve a high performance of 488 MLUPS (mega lattice updates per second) per GPU, and the number of MPI communications is significantly reduced.
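Two quantities in this abstract can be made concrete with a little arithmetic. The sketch below computes MLUPS from a grid size and wall time, and uses the standard lattice-unit relation between kinematic viscosity and the single relaxation time, nu = (tau - 1/2) / 3, to show why a high Reynolds number pushes tau toward 1/2, where the single relaxation time operator tends to lose stability. The function names and arguments are hypothetical and not taken from the JAEA code.

```python
def mlups(cells, steps, seconds):
    """Mega lattice updates per second: cell updates / wall time / 1e6."""
    return cells * steps / seconds / 1.0e6


def srt_relaxation_time(reynolds, u_lattice, length_cells):
    """Relaxation time tau of a single relaxation time (BGK) operator.

    Assumes lattice units with sound speed squared c_s^2 = 1/3, so that
    nu = (tau - 0.5) / 3 and therefore tau = 3 * nu + 0.5.
    """
    nu = u_lattice * length_cells / reynolds  # lattice viscosity from Re = U L / nu
    return 3.0 * nu + 0.5


# Hypothetical numbers: a characteristic length of 100 cells at lattice velocity 0.1.
for re in (1.0e2, 1.0e4, 1.0e6):
    print(re, srt_relaxation_time(re, 0.1, 100))
```

As the Reynolds number grows, the printed tau approaches 0.5 from above, which is the regime the abstract identifies as problematic for the single relaxation time operator.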

Oral presentation

Real time plume dispersion simulation of lattice Boltzmann method

Onodera, Naoyuki

no journal

SPEEDI and its world version (WSPEEDI) were developed to predict the off-site diffusion behavior of radioactive substances over wide areas at the ~100 km scale, based on a mesoscale meteorological model. In this work, we apply two new ingredients, GPUs and an adaptive mesh refinement (AMR) method, to the lattice Boltzmann method (LBM). We confirm good scalability on a GPU-rich supercomputer and show that our code can reproduce a wind tunnel experiment. We conclude that the present LBM is one of the most promising approaches to realizing real-time simulation.

Oral presentation

Plume dispersion simulation using lattice Boltzmann method in urban area

Onodera, Naoyuki

no journal

Simulations of the dispersion of radioactive substances attract high social interest and are required to be both rapid and accurate. To perform a real-time simulation with a high-resolution mesh at the scale of human living areas, such as alleyways and buildings, it is necessary to develop simulation schemes that can fully utilize high computational performance. In this study, we introduced a nudging-based data assimilation method into the lattice Boltzmann method (LBM), so that we can perform plume dispersion simulations for urban areas.
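Nudging, in its generic form, relaxes the model state toward observed values by adding a term proportional to their difference. The sketch below shows one explicit nudging update on a list of values. It is a minimal sketch of the general technique only: the abstract does not say how nudging is coupled to the LBM, and the function name `nudge`, the relaxation time parameter, and the sample numbers are hypothetical.

```python
def nudge(model, observed, dt, relax_time):
    """One explicit nudging step: u <- u + (dt / relax_time) * (u_obs - u).

    `relax_time` sets how strongly the model is pulled toward the
    observations; dt / relax_time is kept within [0, 1] so that the
    update never overshoots the observed value.
    """
    alpha = dt / relax_time
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("dt / relax_time must lie in [0, 1]")
    return [m + alpha * (o - m) for m, o in zip(model, observed)]


# Hypothetical wind speeds: model values drift toward the observations.
state = [0.0, 2.0]
obs = [1.0, 2.0]
for _ in range(3):
    state = nudge(state, obs, dt=0.5, relax_time=1.0)
print(state)
```

A short relaxation time forces the model to follow the observations closely, while a long one lets the model dynamics dominate.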